This report presents a comprehensive analysis of trends and patterns in students’ applications to postgraduate programs. Using RStudio, we examined several of the variables that affect the application process. The results offer useful perspectives for educational institutions seeking to improve their postgraduate admissions strategies.
1.1 Overview
1.2 Questions we posed
1.3 Scope of Analysis
2.1 Overview of Dataset
2.2 Problems with the dataset
3.1 Data Sources
3.2 Variables Considered
3.3 Data Preprocessing
4.1 Libraries used in the process
4.2 Plots we have used
4.3 Plot of Average CGPA by University
4.4 Plot of Status Distribution for University of Washington
4.5 Plot of students changing their UG Major
4.6 Plot on Comparison between those who have and have not changed their UG major
4.7 Plot of CGPA v/s TOEFL of IIT Kanpur
4.8 Plot on Admit v/s Reject based on UG Major of IIT Bombay
4.9 Admissions Pie Chart for University of Texas, Austin
4.10 Plot on TARGET.MAJOR v/s CGPA
The increasing competitiveness of postgraduate programs has made choosing a good program at a reputable university difficult, and doing so requires a thorough understanding of application trends and patterns. This analysis aims to provide insight into admissions at academic institutions, to help applicants in the admission process.
The study covers data on 100 students from each of IIT Bombay, IIT Delhi, IIT Madras, IIT Kanpur, IIT Kharagpur, IIT Roorkee and IIT Guwahati, with different degrees, who applied for postgraduate study at different foreign universities. For each applicant it records the CPI/CGPA, UG college and degree, GRE and TOEFL/IELTS scores, work experience, and whether the application was accepted or rejected.
Our project uses data on students from different colleges and degrees who applied for postgraduate study at foreign universities. For each applicant, the data records the CPI/CGPA, UG college and degree, GRE and TOEFL scores, work experience, and whether the application was accepted or rejected. As of now, we have scraped data for these colleges: IIT Bombay, IIT Delhi, IIT Madras, IIT Kanpur, IIT Kharagpur, IIT Roorkee and IIT Guwahati. For each college, we scraped data for 100 students.
The analysis of the scraped dataset revealed notable challenges, mainly around data completeness and potential sample bias. Only about 2% of applicants had provided IELTS scores, and these entries were not associated with any of the universities we collected data for, so removing them did not affect the dataset.
The data with the IELTS score column:
Moreover, not all rejected applicants had entered their data, so the dataset is an incomplete representation of the entire applicant pool and may suffer from selection bias. Caution has therefore been exercised when drawing conclusions or making inferences from this dataset, and efforts to address these data gaps should be prioritized to improve the robustness and reliability of subsequent analyses. We tried to mitigate the problems in the dataset through preprocessing.
The libraries we used for scraping are:
tidyverse
rvest
RSelenium
netstat
We used ChromeDriver to automate interactions with the Chrome browser for web scraping, using the RSelenium library along with rvest in RStudio.
We used netstat to obtain information about network connections, routing tables, interface statistics, and other networking-related details.
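As a rough illustration of the parsing half of this workflow, the sketch below uses rvest to turn an HTML table into a data frame. It works on an inline HTML fragment rather than a live page, so no browser automation is involved; the table markup, the `#results` id and the column names are hypothetical, since the report does not describe the structure of the scraped pages.

```r
library(rvest)

# Hypothetical page fragment standing in for a results table.
# The real site's markup is not described in this report.
page <- minimal_html('
  <table id="results">
    <tr><th>University</th><th>Status</th><th>CGPA</th></tr>
    <tr><td>University of Washington</td><td>Admit</td><td>8.9</td></tr>
    <tr><td>University of Texas, Austin</td><td>Reject</td><td>7.8</td></tr>
  </table>')

# Select the table node by its (assumed) id, then convert it to a data frame.
results_node <- html_element(page, "#results")
admits <- html_table(results_node)

print(admits)
```

With a page rendered by a browser session, the same `html_element()` and `html_table()` calls would be applied to the page source once it has been read into rvest.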
The university to which the applicant has applied for the PG program.
Status denotes whether the application was accepted or rejected.
The major for which the applicant has applied.
Academic semester in which the applicant has applied.
GRE (Graduate Record Examination) score, consisting of the scores for three sections: Verbal Reasoning, Quantitative Reasoning and Analytical Writing.
TOEFL/IELTS score of the applicant.
The college from which the applicant has completed their UG program.
Undergraduate program of the applicant.
Cumulative Grade Point Average representing the average of the grade points obtained in all courses.
Number of research papers written.
Work Experience of the applicant.
We used the dplyr library for preprocessing.
Removed the NULL rows from the scraped data.
Scaled all CGPAs to a 10-point scale.
Made course baskets for target major courses:
Electrical & Computer Engineering, Computer Science, Computer Engineering, Computing Science, Applied Computing, Software Engineering, Information Management and Systems, Cyber Security, Computational Science & Engineering, Information Technology Management, Computer & Information Science, Information Technology, Computer Networks, Big Data, Information Systems under Computer.
Data Science, Data Analytics, Artificial Intelligence, Machine Learning, Robotics, Computational and Mathematical Engineering, Bioinformatics, Data Science and Business Analytics under AI_ML_DS
Electrical Engineering, EECS, Telecommunications Engineering under Electrical
Mechanical Engineering, Industrial Engineering, Industrial and Systems Engineering under Mechanical
Chemical Engineering, Chemical and Petroleum Engineering under Chemical
Civil Engineering, Civil & Environmental Engineering under Civil
Finance, Business Analytics, Business Analytics and Information Syste, MBA, Business Analytics Flex, Business Intelligence and Analytics under Business
Engineering Management, Information Management, Supply Chain Management, Management Science and Engineering under Management
Separated the TOEFL and IELTS scores using an appropriate condition, then removed the IELTS column.
Changed the names of the following columns:
The preprocessed data table looks like this:
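The preprocessing steps listed above could be sketched in dplyr roughly as follows. This is an illustration on made-up rows, not the code used for the report. The rescaling rule (4-point and 100-point scales converted to 10), the TOEFL/IELTS condition (scores of 9 or below treated as IELTS bands) and the `ENGLISH.SCORE` and `BASKET` column names are all assumptions, and the lookup covers only a small excerpt of the course baskets.

```r
library(dplyr)

# Hypothetical raw rows; ENGLISH.SCORE is an assumed column name.
raw <- data.frame(
  TARGET.MAJOR  = c("Computer Science", "Data Science", "EECS", "MBA", NA),
  CGPA          = c(8.7, 3.6, 9.1, 78, 8.0),
  ENGLISH.SCORE = c(110, 7.5, 104, 100, 112),
  stringsAsFactors = FALSE
)

# Excerpt of the course baskets: target major -> basket.
baskets <- c(
  "Computer Science" = "Computer",
  "Data Science"     = "AI_ML_DS",
  "EECS"             = "Electrical",
  "MBA"              = "Business"
)

clean <- raw %>%
  na.omit() %>%                        # drop rows with missing values
  mutate(
    # Assumed rule: <= 4 is a 4-point scale, > 10 is a percentage.
    CGPA = case_when(
      CGPA <= 4  ~ CGPA * 2.5,
      CGPA <= 10 ~ CGPA,
      TRUE       ~ CGPA / 10
    ),
    BASKET = unname(baskets[TARGET.MAJOR]),
    # Assumed condition: IELTS bands run 0-9, TOEFL scores are larger.
    TOEFL = ifelse(ENGLISH.SCORE > 9, ENGLISH.SCORE, NA_real_),
    IELTS = ifelse(ENGLISH.SCORE <= 9, ENGLISH.SCORE, NA_real_)
  ) %>%
  select(-IELTS, -ENGLISH.SCORE)       # IELTS is dropped, as in the report

print(clean)
```

A named character vector keeps the basket mapping in one place, so adding a major to a basket is a one-line change.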
shiny
shinythemes
shinydashboard
dplyr
ggplot2
ggrepel
plotly
DT
Scatter Plots
Bar Plots
Pie Charts
This bar plot visualizes the average CGPA (Cumulative Grade Point Average) for different universities. Each bar represents a university, and the height of the bar corresponds to the average CGPA of admitted students from that university. The universities are arranged in ascending order based on their average CGPA, allowing for a quick comparison of academic performance across institutions.
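A minimal sketch of how such a plot could be built with dplyr and ggplot2 is shown here, using made-up rows. The `UNIVERSITY` and `STATUS` column names and the university labels are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data; column names other than CGPA are assumed.
admits <- data.frame(
  UNIVERSITY = c("A", "A", "B", "B", "C"),
  STATUS     = c("Admit", "Admit", "Admit", "Reject", "Admit"),
  CGPA       = c(9.0, 8.0, 7.5, 9.5, 9.2),
  stringsAsFactors = FALSE
)

# Average CGPA of admitted students per university, in ascending order.
avg_cgpa <- admits %>%
  filter(STATUS == "Admit") %>%
  group_by(UNIVERSITY) %>%
  summarise(AVG.CGPA = mean(CGPA), .groups = "drop") %>%
  arrange(AVG.CGPA)

# reorder() makes the bars follow the ascending average on the x-axis.
p <- ggplot(avg_cgpa, aes(x = reorder(UNIVERSITY, AVG.CGPA), y = AVG.CGPA)) +
  geom_col() +
  labs(x = "University", y = "Average CGPA")

print(avg_cgpa)
```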
This pie chart visualizes the distribution of admission statuses (Admit and Reject) for applicants to the University of Washington. The Admit status is depicted in green and the Reject status in red, providing a clear visual differentiation.
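One common way to draw a pie chart in ggplot2 is a single stacked bar mapped onto polar coordinates. The sketch below follows that approach on made-up rows; the `UNIVERSITY` and `STATUS` column names are assumptions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data; column names are assumed.
admits <- data.frame(
  UNIVERSITY = c(rep("University of Washington", 3), "Other University"),
  STATUS     = c("Admit", "Admit", "Reject", "Admit"),
  stringsAsFactors = FALSE
)

# Count each status for the chosen university.
status_counts <- admits %>%
  filter(UNIVERSITY == "University of Washington") %>%
  count(STATUS, name = "COUNT")

# A stacked bar in polar coordinates gives the pie; colours follow the text.
p <- ggplot(status_counts, aes(x = "", y = COUNT, fill = STATUS)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  scale_fill_manual(values = c(Admit = "green", Reject = "red")) +
  theme_void()

print(status_counts)
```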
This bar plot illustrates the count of occurrences for each target major, categorized by whether the undergraduate major (UG Major) differs from the target major (Target Major). Each bar represents a specific target major, sorted in descending order by count. The height of each bar corresponds to the frequency of occurrences, providing a visual representation of how many individuals experienced a change or remained consistent in their major from undergraduate to target.
This bar plot compares the count of individuals who either changed or did not change their major from undergraduate (UG) to their target major. The x-axis represents the major change status, with two bars: Changed and Did Not Change. The height of each bar corresponds to the frequency of occurrences, indicating how many individuals fall into each category.
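The comparison could be computed along the following lines. The sketch assumes that UG and target majors have already been mapped to the same set of labels, so a simple inequality test identifies a change; the rows and the `MAJOR.CHANGE` column name are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data with UG and target majors on a shared set of labels.
admits <- data.frame(
  UG.MAJOR     = c("Computer", "Mechanical", "Electrical", "Civil", "Computer"),
  TARGET.MAJOR = c("Computer", "AI_ML_DS", "Electrical", "Computer", "AI_ML_DS"),
  stringsAsFactors = FALSE
)

# Label each applicant, then count the two categories.
change_counts <- admits %>%
  mutate(MAJOR.CHANGE = ifelse(UG.MAJOR != TARGET.MAJOR,
                               "Changed", "Did Not Change")) %>%
  count(MAJOR.CHANGE, name = "COUNT")

p <- ggplot(change_counts, aes(x = MAJOR.CHANGE, y = COUNT)) +
  geom_col() +
  labs(x = "Major change status", y = "Count")

print(change_counts)
```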
This scatter plot depicts the relationship between CGPA and TOEFL scores for students from IIT Kanpur. Each point represents an individual, with the color distinguishing between Accepted (green) and Rejected (red) statuses. The visualization gives an overview of how CGPA and TOEFL scores vary among applicants from IIT Kanpur, and the color differentiation helps identify the admission status of each individual.
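A scatter plot of this kind could be sketched as below. The `UG.COLLEGE`, `STATUS` and `TOEFL` column names, the Admit/Reject labels and the rows themselves are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data; column names other than CGPA are assumed.
admits <- data.frame(
  UG.COLLEGE = c("IIT Kanpur", "IIT Kanpur", "IIT Kanpur", "IIT Delhi"),
  CGPA       = c(8.9, 7.6, 9.3, 8.1),
  TOEFL      = c(112, 98, 115, 105),
  STATUS     = c("Admit", "Reject", "Admit", "Admit"),
  stringsAsFactors = FALSE
)

# Keep only applicants from the chosen undergraduate college.
kanpur <- admits %>% filter(UG.COLLEGE == "IIT Kanpur")

# One point per applicant, coloured by admission status.
p <- ggplot(kanpur, aes(x = CGPA, y = TOEFL, colour = STATUS)) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(Admit = "green", Reject = "red")) +
  labs(x = "CGPA", y = "TOEFL score", colour = "Status")

print(kanpur)
```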
The plot visualizes the distribution of admission statuses (Admit and Reject) based on undergraduate (UG) majors for students from IIT Bombay. Each bar represents a different UG major, and the height of the bar corresponds to the count of individuals falling into the respective Admit or Reject category. The x-axis displays various UG majors, and the bars are color-coded to differentiate between admission statuses. This bar plot provides insights into how admission decisions vary across different UG majors at IIT Bombay.
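Counting by two variables and dodging the bars is one way to produce this kind of plot. The sketch below uses made-up rows and assumed `UG.COLLEGE` and `STATUS` column names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data; UG.COLLEGE and STATUS are assumed column names.
admits <- data.frame(
  UG.COLLEGE = c(rep("IIT Bombay", 4), "IIT Delhi"),
  UG.MAJOR   = c("Computer", "Computer", "Computer", "Mechanical", "Civil"),
  STATUS     = c("Admit", "Reject", "Admit", "Reject", "Admit"),
  stringsAsFactors = FALSE
)

# One row per (UG major, status) pair for the chosen college.
major_status <- admits %>%
  filter(UG.COLLEGE == "IIT Bombay") %>%
  count(UG.MAJOR, STATUS, name = "COUNT")

# position = "dodge" places the Admit and Reject bars side by side.
p <- ggplot(major_status, aes(x = UG.MAJOR, y = COUNT, fill = STATUS)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c(Admit = "green", Reject = "red")) +
  labs(x = "UG major", y = "Count", fill = "Status")

print(major_status)
```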
The plot is a pie chart that visualizes the distribution of admission statuses (“Admit” and “Reject”) for University of Texas, Austin based on a selected score type and a corresponding threshold. Each slice of the pie represents a different admission status, and the size of each slice corresponds to the count of individuals falling into the respective Admit or Reject category.
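The filtering behind such a chart could look like the sketch below, where the score type is passed as a string and looked up with dplyr's `.data` pronoun. The helper `status_above_threshold()`, the column names and the rows are hypothetical, and keeping scores at or above the threshold is an assumption about how the threshold is applied.

```r
library(dplyr)

# Hypothetical data; column names are assumed.
admits <- data.frame(
  UNIVERSITY = rep("University of Texas, Austin", 5),
  STATUS     = c("Admit", "Reject", "Admit", "Reject", "Admit"),
  GRE        = c(325, 310, 330, 322, 315),
  TOEFL      = c(110, 100, 112, 104, 108),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count statuses for applicants to one university
# whose selected score is at or above the threshold (assumed direction).
status_above_threshold <- function(data, university, score_type, threshold) {
  data %>%
    filter(UNIVERSITY == university, .data[[score_type]] >= threshold) %>%
    count(STATUS, name = "COUNT")
}

gre_320 <- status_above_threshold(
  admits, "University of Texas, Austin", "GRE", 320
)
print(gre_320)
```

The resulting counts can be drawn as a pie with ggplot2 by combining `geom_col()` with `coord_polar(theta = "y")`.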
The plot is a scatter plot that compares the TARGET.MAJOR variable against a selected variable (CGPA in this case) for a specific undergraduate major (UG.MAJOR). Each point in the plot represents an observation, with the x-coordinate corresponding to the selected variable (CGPA) and the y-coordinate corresponding to the TARGET.MAJOR. The color of each point distinguishes between the admission statuses Admit (green) and Reject (red).
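A plot with a user-selected x variable could be sketched as follows. The helper `plot_target_vs()` is hypothetical, as are the rows and the `STATUS` column name; `TARGET.MAJOR`, `UG.MAJOR` and `CGPA` are the names used in the text.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data; STATUS is an assumed column name.
admits <- data.frame(
  UG.MAJOR     = c("Mechanical", "Mechanical", "Mechanical", "Civil"),
  TARGET.MAJOR = c("Mechanical", "AI_ML_DS", "Computer", "Civil"),
  CGPA         = c(8.2, 9.1, 7.4, 8.8),
  STATUS       = c("Admit", "Admit", "Reject", "Admit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: scatter of TARGET.MAJOR against a chosen variable
# for applicants from one UG major. x_var is a column name given as a string.
plot_target_vs <- function(data, ug_major, x_var) {
  subset_data <- data %>% filter(UG.MAJOR == ug_major)
  ggplot(subset_data,
         aes(x = .data[[x_var]], y = TARGET.MAJOR, colour = STATUS)) +
    geom_point(size = 3) +
    scale_colour_manual(values = c(Admit = "green", Reject = "red")) +
    labs(x = x_var, y = "TARGET.MAJOR", colour = "Status")
}

p <- plot_target_vs(admits, "Mechanical", "CGPA")
```

Because the x variable is passed as a string, the same helper can plot any other numeric column in its place.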
Most of the students preferred majors that are related to Computer Science, Artificial Intelligence, Machine Learning and Data Science.
A higher CGPA does matter in the admission process if you aim for an Ivy League university.
Having a paper or some work experience does improve your chances of an admit.
Although there is a cutoff on GRE/TOEFL scores, having 320+/105+ significantly improves the chances of an admit.
Data scraped from https://admits.fyi/.
R documentation was very useful for finding functions in base R.